Text Segmentation by

نویسندگان

  • TopicJay M. Ponte
  • W. Bruce Croft
چکیده

We investigate the problem of text segmentation by topic. Applications for this task include topic tracking of broadcast speech data and topic identiication in full-text databases. Researchers have tackled similar problems before but with diierent goals. This study focuses on data with relatively small segment sizes and for which within-segment sentences have relatively few words in common making the problem challenging. We present a method for segmentation which makes use of a query expansion technique to nd common features for the topic segments. Experiments with the technique show that it can be eeective.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling

In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...

متن کامل

A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling

In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...

متن کامل

Document Analysis And Classification Based On Passing Window

In this paper we present Document analysis and classification system to segment and classify contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

متن کامل

Directional Stroke Width Transform to Separate Text and Graphics in City Maps

One of the complex documents in the real world is city maps. In these kinds of maps, text labels overlap by graphics with having a variety of fonts and styles in different orientations. Usually, text and graphic colour is not predefined due to various map publishers. In most city maps, text and graphic lines form a single connected component. Moreover, the common regions of text and graphic lin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1997